Abstract

Implementation of Ambisonic reproduction systems is limited by the number and placement of the loudspeakers. In practice, real-world systems tend to have insufficient loudspeaker coverage above and below the listening position. Because the localization experienced by the listener is a nonlinear function of the loudspeaker signals, it is difficult to derive suitable decoders analytically. As an alternative, it is possible to derive decoders via a search process in which analytic estimators of the localization quality are evaluated at each search position. We discuss the issues involved, describe a set of tools for generating optimized decoder solutions for irregular loudspeaker arrays, and demonstrate those tools with practical examples.

1 Introduction

Ambisonics is a versatile surround sound recording and reproduction system. One of the attractions is that the transmission format is independent of the loudspeaker layout. However, this means that each playback system needs a custom decoder that is matched to the loudspeaker array. The decoder creates the loudspeaker signals from the transmission signals. Ambisonics theory provides simple encapsulations of high- and low-frequency auditory localization that can be used to design decoders, as well as theorems that ease the design of decoders for regular polygonal and polyhedral loudspeaker arrays.

In earlier papers, the present authors have discussed the design and testing of first-order decoders for regular horizontal loudspeaker layouts [Heller et al., 2008] as well as the use of nonlinear optimization to design decoders for ITU 5.1 arrays [Heller et al., 2010]. In this paper, we extend that work to full periphonic (3-D) arrays and higher-order Ambisonics (HOA). The techniques are implemented as a MATLAB [2011] and GNU Octave [Gnu, 2011] toolkit that makes use of the NLOpt library [Johnson, 2011] to perform the optimization.

We use the term decoder to mean the configuration for a decoding engine that does the actual signal processing. Examples of such engines are Ambdec [Adriaensen, 2011], which operates in real time, and an offline decoder we have implemented as part of this toolkit (see footnote 1).

In this paper, we use bold roman type to denote vectors, italic type to denote scalars, and sans serif type to denote signals. A scalar with the same name as a vector denotes the magnitude of the vector. A vector with a circumflex (“hat”) is a unit vector, so, for example, $\hat{\mathbf{r}}_E = \mathbf{r}_E / r_E$.

We start with a discussion of the design process and the tradeoffs involved, then the specifics of the optimization process, and finally results for two arrays: a third-order decoder for the 22-loudspeaker CCRMA array, and a second-order decoder for the 12-loudspeaker 30° tri-rectangle array.

2 Designing Ambisonic Decoders 

Ambisonics represents a sound field with a group of signals that are proportional to spherical harmonics. The original Ambisonic systems were first order only, but more recently, higher-order systems have been implemented. In first-order Ambisonics the zeroth-order component represents the sound pressure, and the three first-order components represent the acoustic particle velocity. If these components are reproduced exactly, then the sound will be correct at the center. However, it is not possible to get the first-order components correct except at a single point and not practical to get them correct at higher frequencies, where the wavelengths become smaller than the size of the human head. The task of the decoder is to create the best perceptual impression that the soundfield is being reproduced accurately given the loudspeaker array being used.

Footnote 1: Another important function of an Ambisonic decoder is to provide near-field compensation. This compensates for the curvature of the wavefronts due to the finite distance to the loudspeaker and is strictly a function of the distance of the speaker from the center of the array and the order of reproduction. Ambdec and the offline decoder in this toolkit provide such filters and they will not be discussed further in this paper.

In practical terms, the following are necessary:

• Constant amplitude gain for all source directions

• Constant energy gain for all source directions

• At low frequencies, correct reproduced wavefront direction and velocity

• At high frequencies, maximum concentration of energy in the source direction

• Matching high- and low-frequency perceived directions

These criteria may, themselves, have different interpretation or importance depending on the source material and the intended use. We can identify three distinct types of program:

• Natural recordings made with a first-order soundfield microphone.

• Natural recordings made with higher-order microphones. As of this writing, such microphones are just becoming available commercially, but practical constraints will mean that these are still first order at lower frequencies.

• Artificial recordings. First-order as well as Higher Order Ambisonic (HOA) program material.

The first case is Ambisonics’ greatest strength. Good first-order Ambisonic reproduction is probably the closest to recreating a virtual sound environment in your living room, whether that is the buzz of a busy Asian marketplace or the sound of a concert in a good hall. It will most likely be used to create realistic atmosphere even if more precise methods like HOA are used for special sounds.

Preserving this advantage requires a good facsimile of the diffuse field. Energy gain that varies with direction and “bunching” of directions, particularly in the horizontal plane, are both detrimental, as is “speaker detent”, where individual loudspeakers draw attention to themselves.

2.1 Auditory Localization 

Due to the range of wavelengths involved, the human auditory localization mechanism utilizes different directional cues over different frequency regimes. At low frequencies, localization depends on the detection of Interaural Time Differences (ITDs), but at high frequencies there is an ambiguity because a human head is multiple wavelengths across above about 1 kHz. For this reason, localization switches abruptly, depending on Interaural Level Differences (ILDs) above that frequency. One way to predict localization would be to use Head Related Transfer Functions (HRTFs) to calculate the actual ear signals of a listener, but this turns out to be computationally difficult and would vary from listener to listener.

Gerzon developed a series of metrics for predicting localization that are simpler than using the HRTFs [Gerzon, 1992]. The simplest of these metrics are the velocity localization vector, $\mathbf{r}_V$, and the energy localization vector, $\mathbf{r}_E$. The direction of each indicates the direction of the expected localization perception, while the magnitude indicates the quality of the localization. In natural hearing from a single source, the magnitude of each vector should be exactly 1, and the direction of the vectors is the direction to the source. It should be noted that, while $\mathbf{r}_V$ is proportional to the physical quantity of the acoustic particle velocity, $\mathbf{r}_E$ is an abstract construct (see footnote 2).

Following Gerzon [1992], the pressure (amplitude gain), $P$, and total energy gain, $E$, are

$$P = \sum_{i=1}^{n} G_i \qquad (1)$$

$$E = \sum_{i=1}^{n} G_i G_i^{*} \qquad (2)$$

The magnitude and direction of the velocity vector, $r_V$ and $\hat{\mathbf{r}}_V$, at the center of an array with $n$ loudspeakers are

$$r_V \hat{\mathbf{r}}_V = \frac{1}{P} \operatorname{Re} \sum_{i=1}^{n} G_i \hat{\mathbf{u}}_i \qquad (3)$$

whereas the magnitude and direction of the energy vector, $r_E$ and $\hat{\mathbf{r}}_E$, are computed by

$$r_E \hat{\mathbf{r}}_E = \frac{1}{E} \sum_{i=1}^{n} (G_i G_i^{*}) \hat{\mathbf{u}}_i \qquad (4)$$

where the $G_i$ are the (possibly complex) gains from the source to the $i$-th loudspeaker, $\hat{\mathbf{u}}_i$ is a unit vector in the direction of the loudspeaker, and $G_i^{*}$ is the complex conjugate of $G_i$.

Footnote 2: Note that these metrics are not specific to Ambisonics; they can be used to predict the quality of the phantom images produced by any multispeaker reproduction system, regardless of the panning laws used, including plain old two-channel stereo. Gerzon shows this for several well-known stereo phenomena [Gerzon, 1992].
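As an illustration of Equations (1) to (4), the following Python sketch evaluates the four quantities for a set of real-valued gains. It is not part of the toolkit described here; the function names and the two-loudspeaker stereo example are assumptions chosen only to make the definitions concrete.

```python
import math

def gerzon_metrics(gains, directions):
    """Return (P, E, rV, rE) for real loudspeaker gains.

    gains      -- list of real gains G_i, one per loudspeaker
    directions -- list of unit vectors (x, y, z), one per loudspeaker
    rV and rE are 3-element lists (magnitude times unit direction).
    """
    P = sum(gains)                 # Eq. (1): pressure (amplitude) gain
    E = sum(g * g for g in gains)  # Eq. (2): energy gain (G_i real, so G G* = G^2)
    rV = [sum(g * u[k] for g, u in zip(gains, directions)) / P for k in range(3)]
    rE = [sum(g * g * u[k] for g, u in zip(gains, directions)) / E for k in range(3)]
    return P, E, rV, rE

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Hypothetical example: a center-panned source on two loudspeakers at +/-30 degrees.
az = [math.radians(30), math.radians(-30)]
speakers = [(math.cos(a), math.sin(a), 0.0) for a in az]
P, E, rV, rE = gerzon_metrics([0.5, 0.5], speakers)
print(P, E, norm(rV), norm(rE))
```

For this stereo case both vectors point straight ahead with magnitude cos 30°, which shows how a phantom image between two loudspeakers yields magnitudes below 1.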

The velocity vector points in the same direction as, and is proportional to, the acoustic particle velocity. It has been shown that the velocity vector predicts the ITDs very accurately [Benjamin et al., 2010]. The energy vector predicts the ILDs, but in practice it is not possible to get $r_E = 1$ unless the sound is coming from just one loudspeaker. This is representative of a pervasive problem in multichannel sound reproduction. The maximum average value of $r_E$ that can be obtained for a given Ambisonic order is shown in Figure 1. The formulas to compute these are given in Appendix A.

Because different sets of gains are needed to satisfy the low- and high-frequency models, many Ambisonic decoders split the audio into two bands, apply different decoder matrices, and then recombine to produce the loudspeaker signals (see footnote 3). Daniel has suggested that a three-band decoder may provide better reproduction under some listening conditions [Daniel, 2001]. This remains an open question at this time.

2.2 Computing the Low-Frequency Matrix

The low-frequency matrix provides the gains from each channel of the Ambisonic program material to each loudspeaker that are needed to optimize localization as predicted by the velocity localization vector, $\mathbf{r}_V$. Numerous authors have provided derivations of the low-frequency solution for a given loudspeaker array, and thus a number of different terms are used to refer to it, including “velocity”, “matching”, “basic”, “exact”, “mode matching”, “re-encoding” and so forth.

Footnote 3: This places certain constraints on the phase response of the band-splitting filters. We discuss the design and implementation of suitable filters in Appendix B of [Heller et al., 2008] and note that the filters in Ambdec meet these requirements.

Figure 1: Maximum average $r_E$ depending on order and type. “Matching” and “max $r_E$” refer to the decoder matrices described in Sections 2.2 and 2.3, respectively.

In practice, these reduce to projecting (or encoding) the loudspeaker directions onto the selected spherical harmonic basis set (see footnote 4), assembling these vectors into an array, and computing the Moore-Penrose pseudoinverse of the array [Weisstein, 2008]. Examples of this can be found in Appendix A of [Heller et al., 2008]. In general, there are an infinite number of solutions and this procedure provides the solution with the minimum L2-norm (i.e., the least-squares fit), which has the desirable property of requiring the minimum total radiated power (see footnote 5).

Footnote 4: The toolkit is neutral as to the conventions for component ordering and normalization. These conventions are encapsulated in a single function. The current implementation supports the Furse-Malham set [Malham, 2003], but others can be added easily.

Footnote 5: Recently, some authors, drawing upon compressive sensing theory, have suggested that the L1-norm may be more suitable [Wabnitz et al., 2011; Zotter et al., 2012]. The L1-norm minimizes the sum of absolute errors. Compared to least-squares, it allows larger maximum errors in exchange for more zero errors.









Except in the case of degenerate configurations, where all the loudspeakers lie in the null of one or more of the spherical harmonics, this procedure will result in a decoder matrix that satisfies the low-frequency localization criteria exactly; however, it may utilize a great deal of power to get them correct in directions where there is a large angular separation between the loudspeakers in the array. This will result in low $r_E$ values in those directions. As we shall see, except in the case of regular polyhedra and polygons, it is impossible to fully satisfy all the Ambisonic criteria simultaneously. This implies that while the Ambisonic transmission format is independent of the loudspeaker array, not all loudspeaker array geometries perform equally well.
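The projection-and-pseudoinverse procedure can be sketched in a few lines for a first-order horizontal array. The code below is a minimal illustration, not the toolkit's implementation: it assumes the encoding convention W = 1, X = cos(azimuth), Y = sin(azimuth) rather than the Furse-Malham set, and it uses the right-inverse formula D = Kᵀ(KKᵀ)⁻¹, which equals the Moore-Penrose pseudoinverse only when the array is not degenerate (K has full row rank). All function names are hypothetical.

```python
import math

def encode(az):
    """First-order horizontal encoding (assumed convention: W=1, X=cos, Y=sin)."""
    return [1.0, math.cos(az), math.sin(az)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("degenerate array: K K^T is singular")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def basic_decoder(speaker_az):
    """Return D (speakers x components) = K^T (K K^T)^-1."""
    K = transpose([encode(a) for a in speaker_az])  # components x speakers
    Kt = transpose(K)
    return matmul(Kt, inverse(matmul(K, Kt)))

# Example: a rectangle with loudspeakers at 30, 150, 210 and 330 degrees azimuth.
rect = [math.radians(a) for a in (30, 150, 210, 330)]
D = basic_decoder(rect)
src = encode(math.radians(90))  # source directly to the left
gains = [sum(d * s for d, s in zip(row, src)) for row in D]
print(gains)
```

For the source at 90° the two left loudspeakers receive most of the signal while the right pair is driven out of phase, which is the extra power expenditure the text describes for widely separated loudspeakers.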

2.3 Computing the High-Frequency Matrix

The high-frequency matrix provides the gains from each channel of the Ambisonic program material to each loudspeaker that are needed to optimize localization as predicted by the energy localization vector, $\mathbf{r}_E$. Gerzon proved two theorems for first-order reproduction, the polygonal decoder theorem and the diametric decoder theorem. They state that in an array with a minimum of four loudspeakers for 2-D and six speakers for 3-D, where the loudspeakers are spaced at equal angles or in diametrically opposed pairs, $\mathbf{r}_E$ is guaranteed to point in the same direction as $\mathbf{r}_V$. The polygonal decoder theorem also holds for higher-order Ambisonic reproduction, provided there is an adequate number of loudspeakers in the array to support the desired order. This simplifies the task of designing the high-frequency matrix to that of selecting the gain for each order such that the overall magnitude of $\mathbf{r}_E$ is maximized. For first-order decoders, Gerzon provided the values of $\sqrt{2}/2$ for horizontal arrays and $\sqrt{3}/3$ for periphonic arrays. Daniel derived general formulas for these gains [Daniel, 2001], which are given in Tables 1 and 2. (See Appendix A for programs that compute the values in these tables.)
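The tabulated gains can be reproduced with a short computation. The sketch below is not the Appendix A program; it assumes the commonly cited closed forms of Daniel's result, namely $g_m = \cos(m\pi/(2M+2))$ for 2-D arrays of order $M$, and $g_m = P_m(x)$ for 3-D arrays, where $P_m$ is the Legendre polynomial and $x$ is the largest root of $P_{M+1}$. Function names are hypothetical.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def largest_legendre_root(n):
    """Largest root of P_n, found by Newton iteration from a standard guess."""
    x = math.cos(3 * math.pi / (4 * n + 2))
    for _ in range(50):
        pn, pn1 = legendre(n, x), legendre(n - 1, x)
        dpn = n * (x * pn - pn1) / (x * x - 1.0)  # derivative of P_n at x
        x -= pn / dpn
    return x

def max_re_gains_2d(order):
    """Per-order gains for regular polygonal (horizontal) arrays."""
    return [math.cos(m * math.pi / (2 * order + 2)) for m in range(order + 1)]

def max_re_gains_3d(order):
    """Per-order gains for regular polyhedral (periphonic) arrays."""
    x = largest_legendre_root(order + 1)
    return [legendre(m, x) for m in range(order + 1)]

for M in range(1, 6):
    print(M, [round(g, 6) for g in max_re_gains_2d(M)],
             [round(g, 6) for g in max_re_gains_3d(M)])
```

In both cases the gain for order 1 equals the maximum average $r_E$ listed in the second column of the corresponding table.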

As we will see in the example in Section 2.5, once the array deviates from having equal angles, there is no longer a guarantee that $\mathbf{r}_E$ and $\mathbf{r}_V$ point in the same direction or that there is a single set of gains that maximizes $r_E$ in every direction. Because of this, we must trade off the various criteria, and due to the nonlinear nature of the criteria, numerical optimization is needed to compute the solutions. This is discussed in Section 3.

2.4 Merging the LF and HF Matrices

The existence of different optimum decoder coefficients for optimum $\mathbf{r}_V$ and $\mathbf{r}_E$ would typically mean having to make a choice or compromise between the two. In this case, however, both can be had. The decoder that optimizes $\mathbf{r}_V$ can be used at low frequencies and the decoder that maximizes $r_E$ can be used at high frequencies, by the simple expedient of using filters to cross over between the two. This is typically done at around 400 Hz.

Because the higher-order components are reduced in order to maximize $r_E$, there is a reduction in the total signal level of the high-frequency decoder outputs, and thus a reduction in the high frequencies heard by the listener. The gains that maximize $r_E$ specify the relationship among the signals of different order, but not how that gain should be apportioned between high-frequency cuts and low-frequency boosts. There are three possibilities:

1) Preservation of the amplitude. That is, simply use the gains produced by the optimizer or those given in Tables 1 and 2.

2) Preservation of the root-mean-square (RMS) level. This is what Gerzon [1980] suggests and is what is implemented in older analog decoders.

3) Conservation of the total energy. Daniel [2001] suggests this, and the configuration files included with Ambdec follow this recommendation. This method results in more high frequencies with more speakers.

The calculations involved are given in Appendix B. In listening tests, we have found that preservation of the RMS level works well for small arrays. We have also found that using the conservation of energy approach on large 3-D arrays results in overemphasized high frequencies as well as near-head imaging artifacts and nulls. In practice, we set the LF/HF balance by ear, comparing the balance of the two-band decoder to that of a single-band max-$r_E$ decoder. More work is needed to find a procedure for this that does not involve tuning by ear.



Order   Max $r_E$   Gains
1       0.707107    1, 0.707107
2       0.866025    1, 0.866025, 0.5
3       0.92388     1, 0.92388, 0.707107, 0.382683
4       0.951057    1, 0.951057, 0.809017, 0.587785, 0.309017
5       0.965926    1, 0.965926, 0.866025, 0.707107, 0.5, 0.258819

Table 1: Per-order gains for max-$r_E$ decoding with 2-D regular polygonal arrays.


Order   Max $r_E$   Gains
1       0.57735     1, 0.57735
2       0.774597    1, 0.774597, 0.4
3       0.861136    1, 0.861136, 0.612334, 0.304747
4       0.90618     1, 0.90618, 0.731743, 0.501031, 0.245735
5       0.93247     1, 0.93247, 0.804249, 0.62825, 0.422005, 0.205712

Table 2: Per-order gains for max-$r_E$ decoding with periphonic regular polyhedral arrays.


2.5 Selection of a speaker array 

Due to symmetry, regular loudspeaker arrays have the advantage of uniformity in the localization predictors $\mathbf{r}_V$ and $\mathbf{r}_E$. As noted above, practical difficulties usually prevent the attainment of a completely regular array. There will be a tendency for $r_E$ to be greater in the directions where the angular density of loudspeakers is greater, and less in the directions where there are few loudspeakers, and $\mathbf{r}_E$ will tend to point in the directions of concentrations of loudspeakers.

It should be noted at this point that it is impossible to get $r_E$ to be larger in the direction between loudspeakers than the value achieved simply by driving the loudspeakers nearest to the gap equally. This means that the best that can be achieved by an Ambisonic decoder is to have a smooth transition between areas where the performance is good (large number of loudspeakers, high magnitude of $\mathbf{r}_E$, and $\mathbf{r}_E$ points in the intended direction) and areas where the performance is less good (fewer loudspeakers, $\mathbf{r}_E$ has small magnitude and points in an incorrect direction). As such, we must be careful in choosing the decoder parameters so that the performance in the good directions is good enough, and the performance in the poor directions is not too bad.

A simple example of a four-speaker array will illustrate these difficulties. The basic decoder solution for a square horizontal array gives exact recovery of the pressure and velocity at the center of the array: $|\mathbf{r}_V| = 1$ and $\mathbf{r}_V$ points in the intended direction. But because the angular separation of the loudspeakers is 90°, $r_E = 2/3$. However, if we investigate what happens as the ratio of pressure (W) and velocity (X, Y and Z) is varied, it turns out that $r_E$ is maximum for the case where the first-order components are reduced to $\sqrt{2}/2$ of their original value. This gives a magnitude of $\mathbf{r}_E$ of $\sqrt{2}/2 \approx 0.707$.
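The square-array figures can be checked by hand. The derivation below is a sketch under stated assumptions: loudspeakers at azimuths 45°, 135°, 225° and 315°, a source at azimuth $\theta$, and basic-decoder gains normalized so that $P = 1$. The symbol $k$ is introduced here only for illustration and denotes the scaling applied to the first-order components ($k = 1$ is the basic decoder). It uses $\sum_i \cos(\theta-\phi_i) = 0$, $\sum_i \cos^2(\theta-\phi_i) = 2$ and $\sum_i \cos^3(\theta-\phi_i) = 0$ for four equally spaced loudspeakers; the component of $\mathbf{r}_E$ perpendicular to the source direction vanishes by the same symmetry.

```latex
\begin{align*}
G_i &= \tfrac{1}{4}\bigl(1 + 2k\cos(\theta-\phi_i)\bigr),
  \qquad \phi_i \in \{45^\circ, 135^\circ, 225^\circ, 315^\circ\} \\
E &= \sum_{i=1}^{4} G_i^2
   = \tfrac{1}{16}\bigl(4 + 4k^2 \cdot 2\bigr)
   = \tfrac{1}{4}\,(1 + 2k^2) \\
E\,r_E &= \sum_{i=1}^{4} G_i^2 \cos(\theta-\phi_i)
   = \tfrac{1}{16}\,(4k \cdot 2)
   = \tfrac{k}{2} \\
r_E &= \frac{2k}{1 + 2k^2},
  \qquad r_E\big|_{k=1} = \tfrac{2}{3} \\
\frac{d r_E}{dk} &= \frac{2 - 4k^2}{(1 + 2k^2)^2} = 0
  \;\Rightarrow\; k = \tfrac{\sqrt{2}}{2},
  \qquad r_E^{\max} = \tfrac{\sqrt{2}}{2} \approx 0.707
\end{align*}
```

The maximum agrees with the first-order entry of Table 1, and the result is independent of the source azimuth $\theta$, as expected for a regular polygon.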

If the square is replaced with a rectangle with an aspect ratio of $\sqrt{3} : 1$, the front and rear loudspeakers now subtend an angle of 60° and the side loudspeakers subtend an angle of 120°. This reduces $r_E$ at the sides but increases it in the front, relative to a square. If the same gain as derived for the square ($\sqrt{2}/2$ for the first-order components) is applied, then there is a substantial improvement in $r_E$ to the sides, and a very tiny decrease in $r_E$ in the front. This is shown in Figure 2.

But is this the “optimum”? Figure 3 shows that if we further vary the ratio of the zero- and first-order components, it turns out that $r_E$, evaluated at the sides, is a maximum at a different ratio.

Figure 2: Locus of $\mathbf{r}_E$ for a rectangular array, matching and “max $r_E$” decoders.

Figure 3: $r_E$ and the energy, $E$, as a function of the ratio of the first-order to zero-order scaling (horizontal axis: velocity/pressure).

It is thus possible to maximize $r_E$ in front or at the sides, but not both at once. One might wish to use a different decoder depending on whether sound images are expected at the front or on the sides, or a decoder that gives a compromise between the two.
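The front-versus-side tradeoff for the rectangle can be reproduced numerically. The following sketch is illustrative only: it assumes loudspeakers at 30°, 150°, 210° and 330° azimuth, the encoding convention W = 1, X = cos, Y = sin (so the scaling values need not match the horizontal axis of Figure 3), and a hypothetical parameter `k` that scales the first-order part of the pseudoinverse decoder.

```python
import math

SPEAKERS = [math.radians(a) for a in (30, 150, 210, 330)]  # sqrt(3):1 rectangle

def gains(theta, k):
    """Pseudoinverse gains with the first-order part scaled by k.

    For this rectangle and the assumed convention, K K^T = diag(4, 3, 1),
    which gives the closed form used here.
    """
    return [0.25 + k * (math.cos(p) * math.cos(theta) / 3.0
                        + math.sin(p) * math.sin(theta))
            for p in SPEAKERS]

def r_e(theta, k):
    """Return (magnitude of rE, energy gain E) for a source at azimuth theta."""
    g2 = [g * g for g in gains(theta, k)]
    e = sum(g2)
    x = sum(w * math.cos(p) for w, p in zip(g2, SPEAKERS)) / e
    y = sum(w * math.sin(p) for w, p in zip(g2, SPEAKERS)) / e
    return math.hypot(x, y), e

def best_k(theta):
    """Grid search for the scaling that maximizes rE in direction theta."""
    ks = [i / 1000.0 for i in range(1, 1501)]
    return max(ks, key=lambda k: r_e(theta, k)[0])

k_front = best_k(0.0)
k_side = best_k(math.pi / 2)
print("front:", k_front, r_e(0.0, k_front)[0])
print("side: ", k_side, r_e(math.pi / 2, k_side)[0])
```

Under these assumptions the two directions are maximized by clearly different scalings, so any single value of `k` is a compromise between front and side performance.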

Thus far, only the quality and direction of the localization have been discussed. There is also an effect on the loudness of the sound, depending on direction. If the loudspeaker array is irregular, then the solution to recover pressure and velocity results in an increase in energy in the directions where the angular spacing is greatest. This results in an increase in reproduced loudness in those directions.

For the previous example of a rectangular array with a $\sqrt{3} : 1$ aspect ratio, the ratio of the energy in the forward direction to the energy at the sides was calculated and is also plotted in Figure 3. $r_V$ obtains its correct value of 1 in all directions and the pressure response is also omnidirectional. At higher frequencies, where the “max-$r_E$” decoder is in effect, $r_E$ is maximized for front and back directions (where the speakers are closer together). At these frequencies, sounds from the sides (perceived as “energy”) are 2.4 dB louder. This is a pervasive problem for irregular arrays and will be addressed in greater detail below. The simple $\sqrt{3} : 1$ rectangle as above is used widely and is known to give good results. On the other hand, listening tests indicate that the 5.5 dB energy imbalance exhibited by some first-order decoders for ITU 5.1 arrays is too large. From this, we propose a 3 dB variation in “energy” with horizontal direction as the maximum acceptable imbalance.

2.5.1 Discussion of compromises of speaker arrays

The selection of a loudspeaker array for Ambisonic reproduction is subject to a number of compromises, notably the space available to house the array and the budget for purchasing loudspeakers. It may be that the array already exists, in which case the decoder design task is one of selecting a decoder design that provides the best audible performance. In other situations, however, the design of the array has not been fixed although the number of loudspeakers may have been. In that case, there is substantial latitude to trade off between high-order performance horizontally and periphonic performance.

3 Optimizer 

As noted in Section 2.5, in an irregular array, simply scaling the LF and HF matrices does not result in $\mathbf{r}_V$ and $\mathbf{r}_E$ pointing in the same direction; hence, the design procedure becomes somewhat more complex.

Because the key psychoacoustic criteria for good decoder performance are nonlinear functions of the speaker signals, we utilize numerical optimization techniques. To do this, a single objective function is formulated that takes as input the decoder matrix and produces a single figure of merit that decreases as the decoder performance improves. The nonlinear optimization algorithm will then try different sets of matrix elements, attempting to arrive at the lowest value possible. Because there are a number of criteria, we use a weighted sum to provide an overall figure of merit. A user can adjust the weights to set the relative importance of the different criteria, say uniform energy gain (loudness) versus angular accuracy. In addition, each test direction can have its own set of weights, so that, for example, angular accuracy can be emphasized for the front, while uniform energy gain is emphasized in other directions. This might be the preferred configuration for classical music recorded in a reverberant performance hall. On the other hand, environmental recordings made outdoors have very little diffuse content, so overall angular accuracy is more important. Another application of direction weightings is in highly asymmetrical arrays, such as a dome, where few speakers are below the listener. In this case, we expect poor performance in those directions, so they are deemphasized when computing the objective function.

We have employed the NLOpt library for nonlinear optimization [Johnson, 2011]. NLOpt provides a common application programming interface (API) for a collection of nonlinear optimization techniques. In particular, it supports a number of “derivative free” optimization algorithms, which are well suited to the current application where the objective function is the result of a computation, rather than an analytic function.

An earlier version of the optimizer that was limited to first-order horizontal arrays was written in C++ [Heller et al., 2010]. Extending that to higher-order and periphonic arrays required a significant rewrite, so an initial prototype was written in MATLAB, with plans to recode in C++. Because the bulk of the computation is matrix multiplication, which is handled by highly optimized code in MATLAB, it turned out that the execution speed was almost as fast as the original C++ version, so we abandoned plans for the rewrite. To make the code widely usable, it was kept compatible with GNU Octave. The key change is that GNU Octave does not support nested functions, so a number of variables need to be declared global to make them accessible to the objective function.


3.1 Optimization Criteria 

For each test direction, the following are computed: amplitude gain, $P$, energy gain, $E$, the velocity localization vector, $\mathbf{r}_V$, and the energy localization vector, $\mathbf{r}_E$. From these, we compute the pairwise angles between the test direction, $\hat{\mathbf{r}}_V$, and $\hat{\mathbf{r}}_E$. These are summarized with the following figures of merit: deviation of amplitude gain from 1 along the x-axis, and the minimum, maximum, and RMS values of amplitude gain, energy gain, magnitude of $\mathbf{r}_V$, magnitude of $\mathbf{r}_E$, and the pairwise angular deviations.

It is important that the criteria are “well behaved” near zero, so as not to trigger oscillating behavior in the optimizer. They should be continuous and have first derivatives. In practical terms, absolute value and thresholds should not be used; squaring can be used in place of the former and the exponential function in place of the latter.

Finally, directional weightings are applied to each criterion and then an overall weighted sum produces the single figure of merit for that particular configuration.
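The structure of such a figure of merit can be sketched as follows. This is a hypothetical illustration rather than the toolkit's objective function: the dictionary keys, the choice of three criteria, and the target values are all assumptions. It follows the advice above by using squared deviations instead of absolute values, so that every term is continuous and differentiable at zero.

```python
def objective(metrics, weights):
    """Weighted sum of smooth penalties over all test directions (lower is better).

    metrics -- list of dicts, one per test direction, with hypothetical keys:
               'E'   energy gain
               'rE'  magnitude of the energy vector
               'ang' angle in radians between rE and the test direction
               'w'   directional weight for this test direction
    weights -- dict of criterion weights with keys 'energy', 'rE', 'angle'
    """
    total = 0.0
    for m in metrics:
        energy_term = (m['E'] - 1.0) ** 2   # squared deviation from unit energy gain
        re_term = (1.0 - m['rE']) ** 2      # penalize rE magnitudes below 1
        angle_term = m['ang'] ** 2          # squared angular error
        total += m['w'] * (weights['energy'] * energy_term
                           + weights['rE'] * re_term
                           + weights['angle'] * angle_term)
    return total

# Hypothetical usage: emphasize angular accuracy over energy uniformity.
W = {'energy': 1.0, 'rE': 1.0, 'angle': 4.0}
good = [{'E': 1.0, 'rE': 0.85, 'ang': 0.02, 'w': 1.0}]
bad = [{'E': 1.3, 'rE': 0.60, 'ang': 0.30, 'w': 1.0}]
print(objective(good, W), objective(bad, W))
```

Setting a direction's `'w'` to a small value deemphasizes it, which corresponds to the treatment of poorly covered directions such as those below a dome array.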

3.2 Test Directions 

As mentioned in the previous section, each candidate set of parameters is evaluated from a number of directions. For 2-D speaker arrays, 180 or 360 evenly spaced directions are often used [Wiggins et al., 2003; Moore and Wakefield, 2008]. For 3-D arrays, the situation is more complex because no more than 20 points can be distributed uniformly on a sphere (the vertices of a dodecahedron).

Lebedev-Laikov quadrature defines sets of points on the unit sphere and weights with the property that they provide exact results for integration of the spherical harmonics [Lebedev and Laikov, 1999]. The current implementation provides a function that returns Lebedev grids of points and corresponding weights for as many as 5810 directions. Our current experiments have used a grid with 2702 points, which corresponds roughly to 3°. The toolkit also has functions providing 2-D and 3-D grids that are sampled in uniform azimuth and elevation increments, which are useful for visualization of the results.

3.3 Optimization Behavior 

As part of the optimization setup, the user supplies a set of stopping criteria. This can be specified as a threshold on the absolute and relative changes in the parameters and/or the objective function, as well as a maximum running time and maximum number of iterations. The default values in the current implementation are $1 \times 10^{-7}$ for the parameters and objective function.

For small 2-D arrays (say, 12 to 24 parameters), the optimizer typically converges in less than 1 minute, examining 40,000 to 1,500,000 configurations. For large high-order arrays (say, 200 to 400+ parameters), it typically converges in less than an hour. These timings were done with Octave version 3.2.4-atlas on a 2.66 GHz Intel Core i7 with 8 GB of memory. The bulk of the computation comprises matrix multiplications and is therefore suitable for parallel implementations. The timings in MATLAB were approximately 2× faster than Octave since it can make use of the multiple cores in the i7 processor.

With large optimization problems, using a local optimization algorithm and providing an initial solution that is near to the optimum is important for reliable convergence. The toolkit currently supports three strategies:

• Using the low-frequency solution modified with the per-order gains that would provide the max-$r_E$ solution for a uniform array.

• “Musil Design”, where additional “virtual” loudspeakers are inserted into the array to make the spacing more uniform, and hence more suited to a pseudoinverse solution. After the optimization is complete, the signals for the virtual speakers are either ignored or distributed to the adjacent speakers [Zotter et al., 2010].

• A hierarchical approach, decomposing the optimization by establishing a solution for each order consecutively, freezing the individual coefficients for orders below the current one, but allowing an overall adjustment on the gain of the lower orders.

4 Examples 

The software tools described above were applied to the derivation of decoders for several real-world systems. The examples given here are the CCRMA listening room (see footnote 6) and a tri-rectangle with the upper and lower loudspeakers at ±30° with respect to horizontal.

Footnote 6: See https://ccrma.stanford.edu/room-guides/listening-room/

4.1 CCRMA Listening Room 

The described software was applied to deriving decoders for the Listening Room at CCRMA (Center for Computer Research in Music and Acoustics) at Stanford. This facility consists of 22 identical loudspeakers arranged in five rings. There is a horizontal ring of eight loudspeakers, two rings of six loudspeakers, one 50° below and one 40° above horizontal, and one loudspeaker directly above and one directly below the listening position. The two hexagonal rings are thus not exactly horizontally opposed. A schematic of the array is shown in Figure 4a.

An initial solution was derived by calculating the pseudoinverse of the loudspeaker projection matrix as described above. The decoder was modified to optimize the magnitude of $\mathbf{r}_E$ at high frequencies by applying the weighting factors given in Table 2 to the gains of the signal components of each order. Given that the theoretical maximum average value for $r_E$ can be no greater than 0.861 at third order (Table 2), the average value of 0.850 for the third-order decoder given here does not leave a great deal of margin for improvement. Nonetheless, the optimization software was applied to the problem. Figure 5 shows the performance of the initial solution and the optimized result. Average $r_E$ was increased slightly, and the maximum directional error was reduced by a factor of 5.

An informal listening test comparing this decoder to the existing one was conducted using third-order test signals and studio recordings, as well as first-order acoustic recordings. The general impression was that the new decoder did a better job of keeping horizontal sources in the horizontal plane, whereas the existing decoder rendered such sources above the horizontal plane.

4.2 The 30° Tri-Rectangle 

As discussed elsewhere, a dodecahedron or other large regular array is difficult to fit into normally dimensioned spaces. One large array that does fit into normal spaces is the so-called tri-rectangle, patterned after a suggestion by Gerzon. A schematic is shown in Figure 4b. It consists of three interlocking rectangles of loudspeakers, one in the horizontal plane, one in the XZ plane, and one in the YZ plane. The projection of the loudspeakers into any plane is an octagon, which hints at its utility for reproducing second-order program material. However, to enable it to fit into typical spaces the vertical rectangles must be squashed to an approximate ±30° vertical angle. This gives a solid angle of 120° above and below the listening position with no loudspeakers. Naturally, this has a profound effect on the localization for sources above and below the listening position.

Figure 4: Schematics of the loudspeaker arrays used in the examples. (a) The 22-loudspeaker array at CCRMA. (b) The 12-loudspeaker tri-rectangle.

Figure 5: $\mathbf{r}_E$ in the vertical plane for the CCRMA array before and after optimization. (a) The initial solution calculated by pseudoinverse and max-$r_E$ gains. (b) The optimized solution. The arrows show the directional error between the low- and high-frequency matrices. In this case, average $r_E$ was increased slightly, from 0.85 to 0.86, and the maximum directional error reduced by a factor of 5.


Performance of the initial solution by inversion
is shown in Figure 6. The magnitude of r_E is
plotted in red for both the horizontal plane and
the vertical (XZ) plane. The horizontal shape is
essentially circular, with perfect direction (not
shown in the figure), but the magnitude of r_E
decreases dramatically for sources above or below
about ±30° of elevation. Furthermore, there is an
increasing error in the direction of r_E,
indicating that high-frequency sounds will be
perceived as coming from near the poles. The
extreme errors in the direction of r_E are
compounded by its low magnitude, making
localization in the up and down directions vague
in any case.

Figure 6: r_E in the horizontal and vertical planes
for the initial second-order decoder.
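The two quantities discussed here, the magnitude of r_E and the angle between its direction and the intended source direction, can be computed directly from a set of loudspeaker gains. The following Python sketch uses the standard Gerzon energy vector, r_E = sum(g_i^2 u_i) / sum(g_i^2). It is not taken from the toolkit: the function names, the square horizontal array, and the first-order gain law are assumptions chosen only to give a case with a known answer.

```python
import math

def energy_vector(gains, directions):
    """Return (magnitude, unit direction) of r_E = sum(g_i^2 u_i) / sum(g_i^2)."""
    energy = sum(g * g for g in gains)
    vec = [sum(g * g * u[k] for g, u in zip(gains, directions)) / energy
           for k in range(3)]
    mag = math.sqrt(sum(c * c for c in vec))
    unit = [c / mag for c in vec] if mag > 0.0 else [0.0, 0.0, 0.0]
    return mag, unit

def angular_error_deg(unit, source):
    """Angle in degrees between the r_E direction and the source direction."""
    dot = sum(a * b for a, b in zip(unit, source))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Hypothetical test case: square horizontal array, first-order decode with
# the 2-D max-r_E gain g_1 = cos(pi/4) applied to the first-order terms.
speaker_az = [45.0, 135.0, 225.0, 315.0]
directions = [(math.cos(math.radians(a)), math.sin(math.radians(a)), 0.0)
              for a in speaker_az]
g1 = math.cos(math.pi / 4)
source_az = math.radians(20.0)
source = (math.cos(source_az), math.sin(source_az), 0.0)

# Loudspeaker gain proportional to 1 + 2 g_1 cos(angle between speaker and source).
gains = [1.0 + 2.0 * g1 * (u[0] * source[0] + u[1] * source[1])
         for u in directions]

mag, unit = energy_vector(gains, directions)
err = angular_error_deg(unit, source)
print(mag, err)
```

For this regular array the magnitude comes out at the 2-D first-order maximum and the directional error is negligible; for an irregular array such as the tri-rectangle, both quantities vary with source direction, which is what the figures plot.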

The large angle subtended by the loudspeaker
placement with respect to the vertical axis makes
it impossible to get precise localization for sources
directly above or below the listening position. It
may, however, be possible to improve the
localization for sources near horizontal by
correcting the direction of r_E.

Running the optimizer with this configuration
as an initial solution resulted in a highly distorted
solution where the sounds are drawn strongly to
the loudspeakers. A 3-D plot of r_E for this
solution is shown in Figure 7. As can be seen, the
performance is very non-uniform (a sphere would
be ideal) and the maximum angular error is over
30°.

Next, a Musil design was attempted. Virtual
loudspeakers were inserted into the array directly
above and below the center. A decoder for this
augmented array was optimized, and the signals for
the virtual loudspeakers were then reassigned to
the nearest real loudspeakers. This resulted in
improved r_E in the horizontal plane, as well as at
elevations as high as ±30°; however, the design
suffers from directional errors as large as 31°.
Figure 8a shows the optimized solution.
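The reassignment step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the text says the virtual loudspeaker signals go to the nearest real loudspeakers but does not say how they are divided, so the equal split among equidistant neighbors used here is an assumption, and the function names, the four-loudspeaker layout, and the decoder coefficients are all hypothetical.

```python
import math

def sph_to_unit(az_deg, el_deg):
    """Unit vector for an azimuth/elevation pair given in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def fold_virtual_rows(rows, directions, n_real, tol=1e-9):
    """Add each virtual loudspeaker's decoder row to its nearest real one(s).

    rows[i] holds the decoder coefficients for loudspeaker i. The first
    n_real entries are real loudspeakers; the remainder are virtual.
    Ties in distance are split equally (an assumption of this sketch).
    """
    folded = [list(r) for r in rows[:n_real]]
    for v in range(n_real, len(rows)):
        # A larger dot product means a smaller angle to the virtual loudspeaker.
        dots = [sum(a * b for a, b in zip(directions[v], directions[r]))
                for r in range(n_real)]
        best = max(dots)
        nearest = [r for r, d in enumerate(dots) if best - d <= tol]
        share = 1.0 / len(nearest)
        for r in nearest:
            folded[r] = [x + share * y for x, y in zip(folded[r], rows[v])]
    return folded

# Hypothetical layout: four real loudspeakers at +30 degrees elevation and
# one virtual loudspeaker at the zenith. Coefficients are made up.
directions = [sph_to_unit(az, 30.0) for az in (0.0, 90.0, 180.0, 270.0)]
directions.append(sph_to_unit(0.0, 90.0))
rows = [[1.0, 0.5], [1.0, -0.5], [1.0, 0.2], [1.0, -0.2], [0.4, 0.8]]

folded = fold_virtual_rows(rows, directions, n_real=4)
print(folded)
```

Because all four real loudspeakers are equidistant from the zenith, each receives one quarter of the virtual row. A different split, for example one that preserves energy rather than amplitude, would be an equally plausible reading of the description.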

Finally, a hierarchical design was attempted,
where each subsequent order is optimized
separately. This resulted in a slightly lower r_E,
but significantly reduced angular error in the
vertical plane. Figure 8b shows the optimized
solution.

Figure 7: 3-D plot of r_E from an unconstrained
optimization of the second-order decoder for the
30° tri-rectangle.

5 Conclusions 

An open source package for the design of
Ambisonic decoders has been presented. The
software allows the derivation of decoders for
arbitrary loudspeaker arrays, 2-D or 3-D. The
software operates under Octave or MATLAB, with
the nonlinear optimization performed by the open
source package NLOPT. Auditory localization at
middle and high frequencies is a nonlinear
function of the loudspeaker signals, so decoders
that work well at those frequencies must be found
by an optimization process.

Two example systems were solved. The first
was a third-order decoder for the 22-loudspeaker
CCRMA listening room. That system is nearly
regular, and it was found that a solution obtained
by inversion of the loudspeaker matrix, with
per-order gains, was nearly as good as one obtained
by the nonlinear optimization process. Nonetheless,
the magnitude of r_E was improved and the angle
error was reduced.

The second system was a 12-loudspeaker
tri-rectangle, with the upper and lower loudspeakers
at 30° above and below the horizontal plane. A
decoder derived for that system via the technique
of inversion followed by per-order gains shows high
magnitudes of r_E in the horizontal plane but low
magnitudes in the polar regions and large errors
in the direction of r_E.

Figure 8: r_E in the vertical plane for second-order decoders for the 30° tri-rectangle.
(a) Musil design. (b) Hierarchical design.

Two additional methods were tried in a search
for a superior solution. The first was the Musil
decoder, in which the array was filled out with
virtual loudspeakers at the poles and the signals
for those loudspeakers were routed to the nearest
real loudspeakers.

The second method was a hierarchical one in
which a solution for each order was established
consecutively, such that a higher-order decoder
is also optimum for lower-order program sources.
This results in a very well-behaved decoder, but
with slightly lower values of r_E.
A Formulas for maximum average r_E
and per-order gains

A.1 Horizontal Arrays

For regular horizontal arrays (Table 1), the
maximum value of r_E and the per-order gains g_m
for a decoder of Ambisonic order M are given by

$$ r_E = \text{largest root of } T_{M+1}(x) \tag{6} $$

$$ g_m = T_m(r_E), \quad m = 0, \ldots, M \tag{7} $$

where T_m is the m-th Chebyshev polynomial of the
first kind. In Mathematica, this can be written
as [7]

Table[ChebyshevT[Range[0, M],
    x /. FindRoot[ChebyshevT[M+1, x], {x, 1}]],
  {M, 1, 5}]
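For readers working outside Mathematica, the same values can be sketched in Python without a root finder, because the roots of Chebyshev polynomials are known in closed form: the largest root of T_{M+1} is cos(pi / (2M + 2)), and T_m(cos t) = cos(m t). The function names below are illustrative and are not part of the toolkit.

```python
import math

def max_re_2d(order):
    """Largest root of T_{order+1}(x), i.e. cos(pi / (2*order + 2))."""
    return math.cos(math.pi / (2 * order + 2))

def gains_2d(order):
    """Per-order gains g_m = T_m(r_E) = cos(m * acos(r_E)) for m = 0..order."""
    theta = math.acos(max_re_2d(order))
    return [math.cos(m * theta) for m in range(order + 1)]

# Same range of orders as the Mathematica expression above.
for M in range(1, 6):
    print(M, round(max_re_2d(M), 4), [round(g, 4) for g in gains_2d(M)])
```

The closed form avoids any dependence on the starting point of an iterative solver, which is convenient when only horizontal arrays are of interest.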

A.2 Periphonic Arrays 

For regular periphonic arrays (Table 2), the
maximum value of r_E and the per-order gains g_m
for a decoder of Ambisonic order M are given by

$$ r_E = \text{largest root of } P_{M+1}(x) \tag{8} $$

$$ g_m = P_m(r_E), \quad m = 0, \ldots, M \tag{9} $$

where P_m is the m-th Legendre polynomial. In
Mathematica, this can be written as

Table[LegendreP[Range[0, M],
    x /. FindRoot[LegendreP[M+1, x], {x, 1}]],
  {M, 1, 5}]
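An equivalent computation can be sketched in plain Python. Legendre roots have no simple closed form, so this version evaluates P_n with Bonnet's recurrence and runs a Newton iteration from x = 1, mirroring the starting point used in the FindRoot call. The function names are illustrative and are not part of the toolkit.

```python
def legendre(n, x):
    """Return (P_n(x), P_{n-1}(x)) using Bonnet's recurrence."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev

def largest_legendre_root(n, tol=1e-14):
    """Newton iteration from x = 1, which descends to the largest root of P_n."""
    x = 1.0
    for _ in range(100):
        p, p_prev = legendre(n, x)
        if x == 1.0:
            dp = n * (n + 1) / 2.0  # P_n'(1)
        else:
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # P_n'(x)
        step = p / dp
        x -= step
        if abs(step) < tol:
            break
    return x

def gains_3d(order):
    """Return (r_E, [g_0..g_order]) with g_m = P_m(r_E), per Eqs. (8) and (9)."""
    r_e = largest_legendre_root(order + 1)
    return r_e, [legendre(m, r_e)[0] for m in range(order + 1)]

# Same range of orders as the Mathematica expression above.
for M in range(1, 6):
    r_e, g = gains_3d(M)
    print(M, round(r_e, 4), [round(v, 4) for v in g])
```

Starting at x = 1 matters: all roots of P_n lie inside (-1, 1) and the polynomial is increasing and convex to the right of its largest root, so the iteration approaches that root from above without overshooting to a smaller one.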

B LF/HF Matching 

As mentioned in Section 2.4, there are three
approaches to adjusting the g_m to match LF/HF
loudness, g'_m = g'_0 g_m. For approach 1, g'_0 = 1.
For approaches 2 and 3, g'_0 is calculated as

$$ E(g_m) = \sum_{m=0}^{M} C_m \, g_m^2 \tag{10} $$

$$ g'_0 = \sqrt{N / E} \tag{11} $$

where C_m is the number of signals in the m-th order
component. In 3-D, C_m = 2m + 1; in 2-D, C_0 = 1
and C_{m>0} = 2. For approach 2, N is the total
number of components: in 3-D, (M + 1)^2; in 2-D,
2M + 1. For approach 3, N is the number of
loudspeakers in the array.
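As a worked illustration of the three approaches, the following Python sketch scales a list of per-order gains by g'_0. It assumes the energy sum of Eq. (10) uses squared gains and that Eq. (11) is g'_0 = sqrt(N / E). The function and parameter names are hypothetical and are not the toolkit's API.

```python
import math

def component_counts(order, three_d=True):
    """C_m: number of signals in the m-th order component."""
    if three_d:
        return [2 * m + 1 for m in range(order + 1)]
    return [1] + [2] * order

def matched_gains(gains, approach, three_d=True, n_speakers=None):
    """Return g'_m = g'_0 * g_m for LF/HF matching approach 1, 2, or 3."""
    counts = component_counts(len(gains) - 1, three_d)
    if approach == 1:
        g0 = 1.0
    elif approach in (2, 3):
        energy = sum(c * g * g for c, g in zip(counts, gains))  # Eq. (10)
        if approach == 2:
            n = sum(counts)  # total number of components
        else:
            if n_speakers is None:
                raise ValueError("approach 3 needs the number of loudspeakers")
            n = n_speakers
        g0 = math.sqrt(n / energy)  # Eq. (11)
    else:
        raise ValueError("approach must be 1, 2, or 3")
    return [g0 * g for g in gains]

# First-order max-r_E gains from Appendix A, matched with approach 2.
hf_3d = matched_gains([1.0, 1.0 / math.sqrt(3.0)], approach=2, three_d=True)
hf_2d = matched_gains([1.0, math.cos(math.pi / 4)], approach=2, three_d=False)
print(hf_3d, hf_2d)
```

At first order, approach 2 gives gains of about 1.414 and 0.816 in 3-D and about 1.225 and 0.866 in 2-D, which agree with the classic first-order shelf-filter gains and so serve as a sanity check on the reconstruction.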

[7] For those without access to Mathematica, these can
also be computed interactively using the Wolfram|Alpha
online service, http://alpha.wolfram.com.



